Candace Ross

Beg to Differ: Understanding Reasoning-Answer Misalignment Across Languages

Dec 27, 2025
Via arXiv

What's in Common? Multimodal Models Hallucinate When Reasoning Across Scenes

Nov 05, 2025
Via arXiv

A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs

Jun 11, 2025
Via arXiv

DIMCIM: A Quantitative Evaluation Framework for Default-mode Diversity and Generalization in Text-to-Image Generative Models

Jun 05, 2025
Via arXiv

Multi-Modal Language Models as Text-to-Image Model Evaluators

May 01, 2025
Via arXiv

EvalGIM: A Library for Evaluating Generative Image Models

Dec 18, 2024
Via arXiv

What makes a good metric? Evaluating automatic metrics for text-to-image consistency

Dec 18, 2024
Via arXiv

Findings of the Second BabyLM Challenge: Sample-Efficient Pretraining on Developmentally Plausible Corpora

Dec 06, 2024
Via arXiv

Improving Model Evaluation using SMART Filtering of Benchmark Datasets

Oct 26, 2024
Via arXiv

Changing Answer Order Can Decrease MMLU Accuracy

Jun 27, 2024
Via arXiv